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Abstract 

We consider two types of Go models of a protein (crambin) and 
study their kinetics through molecular dynamics simulations. In the 
first model, the residue - residue contact interactions are selected 
based on a cutoff distance, R c , between the C a atoms. The fold- 
ing times depend on the value of R c strongly and non-monotonically 
due to the interplay between frustration and the free energy barrier for 
folding. This indicates a need for a physically determined set of native 
contacts that takes into account all the residual atoms. This can be 
accomplished by considering the van der Waals radii of the atoms and 
checking if they are found within a proper range of the van der Waals 
attraction. In the second model, non-native attractive contacts are 
added to the system. This leads to bad foldability. However, for a 
small number of such extra contacts there is a slight acceleration in 
the kinetics of folding. 
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INTRODUCTION 



All-atom molecular dynamics (MD) simulations are not yet adequate tools 
to study kinetics of protein folding. Due to a large number of degrees of free- 
dom involved the accessible time scales in MD are orders of magnitude too 
short compared to the experimental folding times which last for a millisecond 
or longer [|IJ . Thus theoretical studies of the kinetics must involve simplified 
models such as those in which each amino acid is represented by a single 
bead that is located at a position of the C Q atom. This kind of a drastic 
simplification in turn requires a design of effective interactions between the 
beads. A determination of such interactions itself turns out not to be a sim- 
ple task if one requires preservation of the chemical identities of the amino 
acids. Go models are a class of simplified systems that do not take into ac- 
count the sequential specificity of the protein and yet have proved to be fairly 
realistic in terms of the kinetic properties of proteins |2], |3|. These models 
are built based merely on the knowledge of the experimentally determined 
native state. The interactions between the beads are made attractive if the 
pair of residues form a native contact and repulsive otherwise (to take the 
excluded volume into account). 

The selection of what pair of residues is considered as forming a native 
contact is a basic ingredient of a Go-type model. A common procedure is to 
consider the distances, r's, between the C a atoms in the native state, and 
to adopt a certain cut-off distance R c such that an r smaller than R c gives 
rise to a native contact. The value of R c has been chosen in the literature 
quite arbitrarily, between 6.5Aand 8.5A. Though in principle there should 
be some optimal choice of R c that reflects the size and structural details of 
the amino acids, the simple expectation is that the folding kinetics is not 
too sensitive to the specific value of R c within a reasonable range. In this 
paper, we examine in details the dependence of folding kinetics on R c and 
show that this dependence is actually rather strong and quite non-trivial. As 
an illustration we consider the Go-type models of crambin and find that the 
fastest folding time, as determined under optimal folding conditions, depends 
on R c non-monotonically, although the non-monotonicity is restricted to a 
rather narrow range. Generally, however, the bigger the R c , the shorter the 
folding time and the bigger the stability because more and more interactions 
favor moving into the native state. 
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A more physical way || to determine the native contacts involves taking 
all pairs of the non-hydrogen atoms in the two amino acids, assigning the van 
der Waals radii |J to them and checking if there is an overlap. The criterion 
for the overlap takes into account some softness in the potential and assigns 
a factor of 1.244 (the inflection point in the Lennard Jones potential) to the 
sum of atomic radii. This factor is again somewhat subjective but the whole 
procedure involves considering the actual sizes of the amino acids and the 
corresponding Go model will serve as a "realistic" reference system here. For 
crambin, the distribution of the resulting 137 contact distances is shown in 
Figure 1 and the structure file was taken from the PDB [[J. The contact 
distances range from 4.1 to 9.5 A. The i? c -based criterion would involve the 
van der Waals-determined contacts up to the distance of R c but also many 
additional pairs which are upgraded to contacts artificially. On comparing 
kinetics of the reference system to those corresponding to various R c s one 
can find a value for which there is a close resemblance. For crambin, we find 
that the corresponding equivalent R c is near 7.5 A. 

The next issue which we study here is how the folding kinetics are affected 
when attractive interactions are added to the non-native contacts. Recently, 
Plotkin |7J has argued that, as he put it in the title of his paper, "a little 
frustration sometimes helps" , i.e. adding some small noise to non-native in- 
teractions may actually accelerate the kinetics of folding up to several times. 
In Plotkin's model the non-native contacts are assigned with a random en- 
ergy with some small variance. The models we studied are in a similar spirit 
but in our models non-native attractions are assigned to only few non-native 
contacts while their energies are the same as those of the native ones. This 
is done by first generating the reference (van der Waals-based) Go model 
and then by randomly selecting n a extra pairs with the sequence distance 
of at least 4. These extra pairs are endowed with the repulsive potential in 
the reference model and now they acquire a potential which is attractive. 
In agreement with what proposed by Plotkin, the acceleration of the fold- 
ing kinetics is also observed in our models but we find it to be rather weak 
and it happens only when n a is small. After some threshold, the basic ef- 
fect of increasing the n a is to turn the systems into poorer and poorer folders. 
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MODELS AND METHODS 



The Hamiltonian 

An input for the construction of the Go model is a PDB file || with the 
coordinates of all atoms in the native conformation. The coordinates are 
used to determine the length related parameters of the model. 

There are many variants of the Go models depending, for instance, on 
the choice of the functional form of the attractive potentials in the native 
contacts. The 10-12 potentials (with the r~ 12 repulsion and r~ 10 attraction) 
are a common choice, see for instance || |J. Another, used here, is based 
on the Lennard Jones form. We follow the procedure as described in our 
previous papers || [H], [11], 12, 13] and especially in [14]. The Hamiltonian 



consists of the kinetic energy and of the potential energy, E p ({ri}), which is 
given by 

Ep ({ ri }) = V BB + V NAT + V NON + V CHIR . (1) 
The first term, V BB is the harmonic potential 

N-l i 

v BB = E - ^o) 2 , (2) 

1=1 A 

which tethers consecutive beads at the equilibrium bond length, do, of 3.8A. 
Here, r iji+ i = [r, — r i+ i\ is the distance between the consecutive beads and 
k = 100e/A 2 , where e is the energy scale characterizing the native contacts. 

yNAT correS p 0ric l s to the Lennard- Jones interactions in the native con- 
tacts and is given by 



NAT 



i<j 



(3) 



where the sum is taken over all native contacts and e is common to all con- 
tacts. The parameters chosen so that each contact in the native 
structure is stabilized at the minimum of the potential, and a = 5A is a typ- 
ical value. For each pair of interacting amino acids, the two potentials have 
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a minimum energy of — e. The non-native interactions, V NON , are purely 
repulsive and are necessary to reduce the effects of entanglements. They are 
taken as the repulsive part of the Lennard- Jones potential that corresponds 
to the minimum occurring at 5A. This potential is truncated at the minimum 
and shifted upward so that it reaches zero energy at the point of truncation. 

The last term in the Hamiltonian, V CHIR , favors the native sense of the 
chirality at each location along the backbone. A chirality of residue i is 
defined as 

q = (v -' * (4) 

where Vj = Ti+i — Tj. A positive Cj corresponds to right-handed chirality. 
Otherwise the chirality is left-handed. V CHIR is given phenomenologically 
0by 

N-2 I 

v CHIR = E \ K c? e(-cr T ), (5) 

where is the step function (1 for positive arguments and zero otherwise), 
qn at - g chirgjity of residue % in the native conformation, and k is taken 
to be equal to e. The role of V CHIR is primarily to punish energetically any 
deviations from the non-native sense of chirality. 

The attractive non-native contacts, when built in, are given by the Lennard 
Jones attraction with the minimum at 5 A. 

The time evolution 

The time evolution of unfolded conformations to the native state is simu- 
lated by MD as described in |], |10|. The beads representing the amino acids 
are coupled to Langevin noise and damping terms that provide thermostating 
at a temperature T. The equations of motion for each bead are 

mi = — 71" + F c + T , (6) 

where m is the mass of the amino acids represented by each bead. The 



specificity of masses has turned out to be irrelevant for kinetics [12] and it 
is sufficient to to consider masses that are uniform and equal to the average 
amino acidic mass. F c is the net force due to the molecular potentials and 
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external forces, 7 is the damping constant, and T is a Gaussian noise term 
with dispersion y/l^ksT . For both kinds of the contact potentials, time is 
measured in units of r = ^Jma 2 /e, where a is 5A. This corresponds to the 
characteristic period of undamped oscillations at the bottom of a typical 
Lennard- Jones potential. For the average amino acidic mass and e of order 
4kcal/mol, r is of order 3ps. We have found that the folding times, t/ ^ 
depend on 7 linearly so going to more realistic larger values of viscosity, as 
controlled by 7 involves simple rescaling. The equations of motion are solved 
by means of the fifth order Gear predictor-corrector algorithm [jFJ] with a 
time step of 0.005r. 



The folding time is calculated as the median first passage time, and is 
estimated based on at least 201 trajectories. T min is defined as a temperature 
at which tf Q id has a minimum value when plotted vs. T. For short proteins, 
the U-shaped dependence of tf Q id on T may be very broad and then T min is 
defined as the position of the center of the U-shaped curve. The simplified 
criterion for an arrival in the native conformation to be declared is based on 
a simplified approach in which a protein is considered folded if all beads that 
form a native contact are within the cutoff distance of 1.5aa. 



The stability temperature Tf is determined through the nearly equilib- 
rium calculation of the probability that the protein has all of its native con- 
tacts established. Tf is the temperature at which this probability crosses 
|. The calculation is based on at least 5 long trajectories that start in the 
native state. A protein is considered as a good folder if Tf is found within a 
range of temperature at which folding is relatively fast, i.e. preferably close 
to T m i n 16 . 



DEPENDENCE OF FOLDING ON R c 

In our studies of the effects of the cut-off distance R c in the Go models 
of crambin (the PDB code is lcrn), we consider the range from 6 A to 11 
A in which the number of native contacts varies between 75 and 379 (if one 
excludes the peptide bond interactions between the successive beads then 
there is a total of 990 possible contacts since the sequence length is 46). The 
6A case is borderline because it corresponds to one amino acid (the 36'th 
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along the sequence) which does not belong to any contact and the whole 
structure is stabilised by the remaining contacts. 

Figure 2 shows the median values of £/ ^ as a function of temperature, 
T. The top panel is for the reference system, considered to be the "true" 
model of crambin whereas the bottom panel is for three values of R c . In each 
case, the T-dependence has the common U-shaped form but the shape itself 
depends on R c strongly. It becomes very broad for R c at and above 8.75 A. 
This suggests that the nature of the model changes profoundly around 8.75 
A. Furthermore, as indicated by the top panel of Figure 2, R c of near 7.5 
A provides the closest, but by no means perfect, representation of the true 
model. 7.5 A can be then considered as an equivalent value but this quantity 
is protein dependent (7.5 A appears too big for the domain of titin [|T4|j ). 



It is expected that adding more and more contacts that tie the native 
structure stronger and stronger should have a dual effect: one is that the ther- 
modynamic stability should get enhanced and the other is that the kinetic 
links with the native state multiply which should accelerate the folding. Fig- 
ure 3 suggests, however, that this picture, though generally correct, is more 
subtle: the fastest folding time, as determined at T min , is non-monotonic 
around R c of 8.75 A even though the number of contacts, n nat remains mono- 
tonic throughout. This can be explained by the argument that while the free 
energy barrier for folding decreases on increasing R c the effects of frustration 
also get enhanced. The range of R c where tf u increases corresponds to the 
situation where the increment in frustration is the winning factor. 

Whatever the value of R c up tp 11 A, Tf stays in the temperature range 
corresponding to fast folding, i.e. on coolling down, the sequence acquires 
appreciable probability of staying near the native basin before the glassy ef- 
fects set in. This is seen in Figure 2 and also in Figure 4. The latter figure 
shows the values of Tf, T min , and T g2 as a function of R c . T g2 is defined as the 
temperature at which the folding time is twice as high as the fastest time, on 
the low temperature side of the U-curve. For broad U-curves, T g2 is a better 
measure of when the glassy effects set in than T m j n . Each of these quantities 
grow monotonically with R c (and become straighter when plotted vs. n na t) 
and Tf stays between (or near) T g2 and T min which indicates good foldability. 

Interestingly, Tf of the reference system agrees with a Tf obtained for R c 
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between 7 and 7.5 A whereas T m j n of the reference system is equivalent to R c 
slightly higher than 6 A which suggests that the true system cannot really 
be represented by any uniform cutoff value in the contact range. 

Figure 5 shows the specific heat, as obtained by the weighted histogram 



method fl? |, which provides an additional characterization of the equilibrium 
properties. The positions of the maxima grow with R c and so do the maxi- 
mal values. The temperature width, however, remains more or less constant. 
Note that even though R c of 7.5 A provides a reasonable fit to the plots of 
the folding time vs. T for the reference system, the specific heat is off more 
noticeably but the peak position is adequate. It should be noted that the 
peak positions in the specific heat are higher than both Tf and T min for all 
cases. The emergence of a single maximum in the specific heat is a signa- 
ture of an equilibrium folding transition when the affinity towards the native 
basin starts to dominate the free energy. The fact that Tf is smaller than the 
temperature of the specific heat's peak indicates that below the folding tran- 
sition temperature the entropy associated with the chain is still significant 
and not all of the native contacts are established. Complete folding seems 
to be more accurately associated with Tf, which is defined in reference to a 
single micro state - the native conformation. 



THE EFFECTS OF NON-NATIVE ATTRACTIVE CONTACTS 

We now consider the second model in which n a attractive non-native con- 
tacts are added to the reference Go model of crambin. We consider just one 
realization for the contact addition for each n a and a system corresponding to 
a larger value of n a incorporates contacts generated at any smaller value of n a . 

We have checked that for n a < 90 the native state does not change its 
nature and it remains accessible kinetically. Its stability, however, decreases 
with an increasing n a . Figure 6 shows the U-shaped curves of the folding 
time vs. T. In contrast to what happens in the previous model, the curves 
become narrower and narrower on adding contacts. Furthermore, Tf drifts 
towards lower and lower temperatures, eventually leaving the region of fast 
folding. For n a greater than ~ 60, the systems are bad folders. 
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The fastest folding time is shown in Figure 7 as a function of n a . It is 
interesting to note that for small values of n a there is a weak decrease in 
tfoid compared with the case of n a of 0. Though this decrease is small, it 
is in a qualitative agreement with Plotkin's argument which suggests that 
non-native interactions involved in the collapsed phase may lower the free 
energy barrier for folding 0. Beyond n a of about 25, there is a steady and 
nonlinear growth of tf a id with n a . 

Figure 8 shows Tf, T min , and T g2 and a function of n a . Their relative po- 
sitioning indicates a flow from good to bad folding properties and bad native 
stability. Tf becomes smaller than T g2 at n a of about 50. For a sufficiently 
large n a , Tf is expected to disappear. 

The specific heat for several values of n a is shown in Figure 9. In contrast 
to the picture shown in Figure 4, which also deals with a growing number of 
contacts, an increase in n a results in a decrease in the specific heat's max- 
imum and in shifting it towards lower temperatures. A remarkable change 
is observed at n a of 75 and higher, where the specific heat curve acquires a 
double humped shape. These two peaks can be interpreted as correspond- 
ing to two transitions: of folding and of collapse with the former appearing 
at the lower temperature. The emergence of these two transitions indicates 
conditions of bad foldability. Notice also that in accordance with what was 
discussed in the previous section, Tf is always smaller than the temperature 
corresponding to the peak in the specific heat (the smaller peak if there are 
two). 

CONCLUSIONS 

In this paper, we demonstrated that the selection of the proper contact 
range has a strong effect on the folding kinetics in Go models and is not at all 
innoccuous. It is thus important to have a physical scheme, such as based on 
the van der Waals radii of the atoms, that allows for an amino acid by amino 
acid determination of whether they make a native contact or not. Adding 
non-native attractive contacts leads eventually to bad foldability but adding 
just a few attractive non-native contacts slightly accelerates the folding pro- 
cess. 
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FIGURE CAPTIONS 



Figure 1. The distribution of the effective contact lengths in crambin as 
determined by the procedure which is based on the van der Waals radii of 
the atoms. The shaded region corresponds to the contacts that would not be 
included if the cutoff of 7.5 A was adopted. 

Figure 2. The dependence of the folding time on temperature for various 
Go models of crambin. The top panel is for the contacts determined through 
the criterion that involves the van der Waals radii of the residual atoms. 
The bottom panel is for contacts determined by the cutoff based criterion. 
The corresponding values of R c are indicated. The crosses on the top panel 
correspond to R c of 7.5 A. The error bars are of order of the size of the sym- 
bol representing the data points. The arrows indicate values of the folding 
temperature Tf. 

Figure 3. The folding time at T m j„ as a function of the cutoff distance 
R c . The error bars are less than double the circular symbol size. The stars 
represent the numbers of the native contacts corresponding to a given value 
of R c . 

Figure 4. The values of the characteristic temperatures T min (open squares), 
Tf (solid circles), and T g2 (triangles) and positions of the maximum in the 
specific heat (stars) for the Go models of crambin for various values of R c . 

Figure 5. The plots of specific heat as a function of T for several values 
of R c , as indicated. The solid line (denoted by VdW) corresponds to the 
reference model in which the native contacts are determined based on the 
van der Waals radii of the atoms. 

Figure 6. The dependence of the folding time on temperature for Go mod- 
els of crambin in which n a extra non-native attractive contacts are added. 
The values of n a are indicated. The arrows show values of Tf. 

Figure 7. The folding time as a function of n a . The dotted line indicates 
the value corresponding to n a =0. 
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Figure 8. Similar to Figure 4 but for the model with the added non-native 
attractive contacts. 



Figure 9. Specific heat as a function of T for the indicated values of n a . 
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